# SQL Test Data Generator [[💡 App ideas]] [[golang]] [[Software Testing]] **Github**: [GitHub - wesen/squeak: SQL Test Data Generator](https://github.com/wesen/squeak/tree/main) **Youtrack**: [squeak - Youtrack](https://wesen.youtrack.cloud/projects/5f728a0e-c12b-4517-91fe-96c8acbb0477) The idea is to build a small self-contained binary that can be used to quickly scaffold relatively complex [[SQL]] [[Relational Databases|database]] schemas and data ([[MySQL]], [[PostgreSQL]] or [[SQLite]]). This is useful for [[Unit tests]], and could potentially be extended to be used for [[Property Based Testing]]. > [!note] I want this library to [[Yak Shaving|shave a yak]], really. I want to have some reasonably complex test database to use for the tutorials I want to write for [[pgchameleon]]. > [!warning] 🧠 🧠 The writing below is all brainstorming and designing the application. Nothing is at the stage of being documentation yet. I am using the [[ZK - 2g1a - Design software by writing it out with words]] and [[ZK - 3d2 - Design systems using arrows and boxes]] ## General design ### 🧠 Brainstorm I want this to be a standard CLI application with - (viper +) cobra command parsing - should I have viper use the given file as a config file? - should the user be able to override parts of the file with command line flags? in this case viper might make sense - I think adding viper to the mix is maybe a bad idea - using [gofakeit](https://github.com/brianvoe/gofakeit) with is a [[faker]] for [[golang]] - not sure if I want to use a SQL builder, or if I should just use golang templates and code generation - besides generating full sqlite DB files, should we generate anything besides CREATE and INSERT raw SQL files? - do we want to deal with generating migrations (because that's interesting to execute as part of testing say, [[Idea - majipoor - no nonsense mysql to postgresql replication]]) ### Overall structure [[Idea - SQL Test Data Generator 2022-07-05 09.58.01.excalidraw]] ![[Idea - SQL Test Data Generator 2022-07-05 09.58.01.excalidraw.svg]] ### Input YAML format [[Idea - SQL Test Data Generator 2022-07-05 10.02.59.excalidraw]] ![[Idea - SQL Test Data Generator 2022-07-05 10.02.59.excalidraw.svg]] ## Schema generator Given an input [[YAML]] file (exact format will be determined while building), it will: - generate [[SQL]] `CREATE` statements - parse, interpret and cast data types - enums, numbers, strings, blobs - support NULL and default values - create foreign key relationships - allow constraints to be specified Verbs that are related to schema generator: - `create` (or maybe `generate` ?) (but later we might want say, `import` and `migrate`) - `generate` sounds like a better name, since we generate both the schema and the data - we would also need specific names to generate `UPDATE` and `DELETE` and `ALTER` statements. Maybe these would be specified separately in the [[YAML]] input file? - do we need anything else? it would be cool to inspect an existing `CREATE` statement and generate a YAML file from that, but that's for the future (especially seeing how complex CREATE statements can be). This could come in useful for generating migrations however. ### Simple example YAML file The first thing I want to try is just generate a 1000 users for SQLite. ```yaml tables: users: id: auto firstName: firstName lastName: lastName email: email generate: createTables: true tables: users: count: 1000 dialect: sqlite # this could be sql or db output: sql ``` ## Test data generation Once the schema has been generated, the YAML file will be used to specify how to generate test data: - using [[faker]] style generation for each field - arity indication for [[Relational Algebra|relational]] data - handle dates intelligently - compute DAG to know which data to insert first when foreign keys are at play - allow for more complex SQL [[Go templates|template]] [[Code Generation]] (if deemed necessary for creating say, complex inventory metrics) ## Bundle common tables Similar to [[faker]], the library will come with preconfigured "commonly used" tables: - users, products, addresses, posts, categories, etc... - generate a starter YAML to be further tweaked This will allow users to quickly scaffold a complex DB directly with a single [[Command Line|CLI]] command (no [[YAML]] file). ## See also - [[Scaffolding a database for unit testing]]